Introduction


This Handbook is provided as a reference document for M248. It provides 
a concise summary of the material in the units, a few mathematical 
results, tables of discrete and continuous distributions and their properties, 
and statistical tables for use in inferential procedures. 


1 Unit summaries 


Unit 1 Exploring and interpreting data 


1. When describing a dataset, the following terminology is used:


• observations (or cases, or sampling units) refer to objects
  (people, countries, ...) on which characteristics are recorded

• variables are the characteristics recorded, and the pattern of
  variation of a variable is its distribution

• variables are linked if they are each recorded for the same
  observations

• a variable is continuous if its values are numerical and all values
  in an interval are possible

• a variable is discrete if its values are numerical but only
  particular values (typically, integers) are possible

• a variable is categorical if its values indicate to which group an
  observation belongs

• a categorical variable is ordinal if its values correspond to labels
  which have a natural ordering

• a categorical variable is nominal if its values correspond to labels
  but the labels do not have a natural ordering.


2. Useful graphical representations of data include bar charts,
histograms, boxplots and scatterplots:

• bar charts are generally used with categorical data, or with
  numerical data that are discrete; side-by-side bar charts can be
  used to display more than one such variable

• histograms are generally used with continuous data; histograms
  come in frequency and unit-area versions which differ only in their
  ‘vertical’ scaling; histograms need a reasonably large dataset and
  are sensitive to the choice of cutpoints

• boxplots are also generally used with continuous data; a
  comparative boxplot allows more than one continuous variable to
  be displayed at the same time; boxplots cannot show how many
  modes a distribution has

• scatterplots are used to investigate the relationship between two
  numerical variables (which are often continuous but may be
  discrete).

3. A mode in a histogram corresponds to a peak in the heights of the
bars. The data are unimodal if there is just one mode, bimodal if 
there are two modes and multimodal if there are more than two 
modes. 


4. Numerical data that are not symmetric, in the sense that a bar chart 
or histogram shows a clear lack of symmetry, are said to be skew. If a 
bar chart or histogram has a relatively large ‘tail’ of relatively high 
values, then the data are right-skew; a dataset with a relatively long 
tail of relatively low values is left-skew. 


5. If the n values in a dataset are denoted $x_1, x_2, \ldots, x_n$, then the
sample mean is

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i.$


6. When the data are written in order of increasing size, the pth value is
denoted $x_{(p)}$. In particular:

• the sample median is $m = x_{(\frac{1}{2}(n+1))}$

• the sample lower quartile is $q_L = x_{(\frac{1}{4}(n+1))}$

• the sample upper quartile is $q_U = x_{(\frac{3}{4}(n+1))}$

• the sample interquartile range is $q_U - q_L$.


7. The sample standard deviation is

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}.$

The quantity $s^2$ is the sample variance.
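
The summaries in items 5 to 7 can be sketched in Python. This is only an illustration, not part of the Handbook: the function names are hypothetical, and the sketch assumes that when p is not a whole number, $x_{(p)}$ is found by linear interpolation between the two neighbouring ordered values (a convention that the summary above does not spell out).

```python
import math

def order_stat(sorted_x, p):
    """Return x_(p), the pth ordered value (positions counted from 1).

    Assumption: for fractional p, interpolate linearly between the
    two neighbouring ordered values.
    """
    lower = math.floor(p)
    frac = p - lower
    if frac == 0:
        return sorted_x[lower - 1]
    return sorted_x[lower - 1] + frac * (sorted_x[lower] - sorted_x[lower - 1])

def summarise(data):
    """Sample mean, median, quartiles, IQR, variance and sd (needs n >= 3)."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)  # divisor n - 1
    q_l = order_stat(xs, (n + 1) / 4)
    q_u = order_stat(xs, 3 * (n + 1) / 4)
    return {
        "mean": mean,
        "median": order_stat(xs, (n + 1) / 2),
        "lower_quartile": q_l,
        "upper_quartile": q_u,
        "iqr": q_u - q_l,
        "variance": variance,
        "sd": math.sqrt(variance),
    }

print(summarise([2, 4, 4, 5, 7, 9, 11]))
```

With seven observations the positions (n+1)/4, (n+1)/2 and 3(n+1)/4 are whole numbers, so no interpolation is needed in this example.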


Unit 2 Modelling variation 


1. If an experiment is repeated many times, then the number of times 
that an event E occurs is its sample frequency and the proportion 
of times that E occurs is its sample relative frequency. The 
probability that an event E occurs, P(E), is the proportion towards
which the sample relative frequency of E tends as the number of times 
the experiment is repeated increases. 


2. Basic properties of probabilities include: 
• for any event E, $0 \le P(E) \le 1$
• if an event E is impossible, then P(E) = 0
• if an event E is certain to happen, then P(E) = 1
• P(E does not occur) = 1 − P(E occurs)
• for independent events $E_1, E_2, \ldots, E_r$,

  $P(E_1 \text{ and } E_2 \text{ and } \ldots \text{ and } E_r) = P(E_1) \times P(E_2) \times \cdots \times P(E_r).$


A random variable which takes only integer values is discrete. A 
continuous random variable may take any value within a continuous 
range of values. 


The distribution of a discrete random variable X is given by its 
probability mass function (p.m.f.), p: 


$p(x) = P(X = x).$


For all x in the range of X, $0 \le p(x) \le 1$. Also, $\sum_x p(x) = 1$, where the
sum is taken over all x in the range of X.


The distribution of a continuous random variable X is given by its 
probability density function (p.d.f.), f. For all x in the range 
of X, $f(x) \ge 0$. Also, $\int f(x)\,dx = 1$, where the integral is taken over
all x in the range of X. 


For continuous X,

$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx.$


The cumulative distribution function (c.d.f.) of a random 
variable X is 


$F(x) = P(X \le x).$

For continuous X with range $L < x < U$,

$F(x) = \int_{L}^{x} f(y)\,dy,$

so that

$P(x_1 \le X \le x_2) = F(x_2) - F(x_1).$


Unit 3 Models for discrete data 


1. A Bernoulli trial is a single statistical experiment for which there
are two possible outcomes, often referred to as success and failure and 
denoted by 1 and 0. 


A random variable X with range {0,1} has a Bernoulli distribution 
with parameter p, where 0 < p < 1, if it has p.m.f. 


$p(0) = 1 - p, \quad p(1) = p.$

This is written X ~ Bernoulli(p) and is the distribution of the 
outcome of a Bernoulli trial. 





Two events are independent if the probability of the occurrence of 
one event is unaffected by whether or not the other occurs. 


A random variable X has a binomial distribution with parameters
n and p, where 0 < p < 1, if it has p.m.f.

$p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n.$


This is written X ~ B(n, p). The binomial distribution provides a 
model for the total number of successes in a sequence of n independent 
Bernoulli trials, in which the probability of success in a single trial is p. 


A random variable X has a geometric distribution with parameter 
p, where 0 < p < 1, if it has p.m.f. 


$p(x) = (1-p)^{x-1} p, \quad x = 1, 2, 3, \ldots.$


This is written X ~ G(p). The geometric distribution provides a 
model for the number of trials up to and including the first success in 
a sequence of independent Bernoulli trials, in which the probability of 
success in each trial is p. The c.d.f. of X is 


$F(x) = 1 - (1-p)^x, \quad x = 1, 2, 3, \ldots.$


A random variable X has a Poisson distribution with parameter $\lambda$,
where $\lambda > 0$, if it has p.m.f.

$p(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots.$





This is written $X \sim \text{Poisson}(\lambda)$. The Poisson distribution is the
limiting distribution of $X \sim B(n, \lambda/n)$ as n becomes large.
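
The binomial, geometric and Poisson p.m.f.s above translate directly into code. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and chosen for this illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p), x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def geometric_pmf(x, p):
    """P(X = x) for X ~ G(p), x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def geometric_cdf(x, p):
    """P(X <= x) for X ~ G(p), x = 1, 2, 3, ..."""
    return 1 - (1 - p) ** x

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

# A p.m.f. sums to 1 over the range of X.
print(sum(binomial_pmf(x, 10, 0.3) for x in range(11)))

# The geometric c.d.f. agrees with the running total of the p.m.f.
print(geometric_cdf(4, 0.2), sum(geometric_pmf(x, 0.2) for x in range(1, 5)))
```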


A random variable X has a discrete uniform distribution with 
parameters m and n, where m < n, if it has p.m.f. 

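$p(x) = \frac{1}{n-m+1}, \quad x = m, m+1, \ldots, n.$

This random variable is equally likely to take any integer value
between m and n inclusive. The c.d.f. of X is

$F(x) = \frac{x-m+1}{n-m+1}, \quad x = m, m+1, \ldots, n.$

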
A random variable X has a continuous uniform distribution with 
parameters a and b, where a < b, if it has p.d.f. 


$f(x) = \frac{1}{b-a}, \quad a \le x \le b.$

This is written $X \sim U(a, b)$. This random variable is equally likely to
take any value between the two stated bounds. The c.d.f. of X is

$F(x) = \frac{x-a}{b-a}, \quad a \le x \le b.$


Unit 4 Population means and variances 


1. The population mean (or mean or expected value or
expectation) of a random variable is given:

• if X is discrete with p.m.f. p(x), by

  $\mu = E(X) = \sum_x x\,p(x),$

  where the sum is taken over all values x in the range of X

• if X is continuous with p.d.f. f(x), by

  $\mu = E(X) = \int x\,f(x)\,dx,$

  where the integral is taken over all values x in the range of X.


The population variance (or variance) of a random variable X is 
given: 


• if X is discrete with p.m.f. p(x), by

  $\sigma^2 = V(X) = E[(X-\mu)^2] = \sum_x (x-\mu)^2 p(x),$

  where the sum is taken over all values x in the range of X

• if X is continuous with p.d.f. f(x), by

  $\sigma^2 = V(X) = E[(X-\mu)^2] = \int (x-\mu)^2 f(x)\,dx,$

  where the integral is taken over all values x in the range of X

• whether X is discrete or continuous, also by

  $\sigma^2 = V(X) = E(X^2) - \mu^2.$


If X is a random variable and a and b are constants, then the mean 
and variance of Y = aX + b are 


$E(Y) = a\,E(X) + b, \quad V(Y) = a^2\,V(X).$

If $X_1, X_2, \ldots, X_n$ are random variables, then

$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n).$

If $X_1, X_2, \ldots, X_n$ are independent random variables, then

$V(X_1 + X_2 + \cdots + X_n) = V(X_1) + V(X_2) + \cdots + V(X_n).$


Formulas for the mean and variance of members of the families of 
distributions considered in this module are given in the tables of 
discrete and continuous probability distributions in this Handbook. 
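
For a discrete random variable, the definitions of the population mean and variance in this unit can be checked numerically. The sketch below is an illustration only: it uses a fair six-sided die as an assumed example, exact fractions to avoid rounding, and hypothetical function names.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) for a p.m.f. given as a dict {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E[(X - mu)^2] for the same representation."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Assumed example: the score on a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mu = mean(die)
var = variance(die)

# Alternative form of the variance: E(X^2) - mu^2.
second_moment = sum(x**2 * p for x, p in die.items())
print(mu, var, second_moment - mu**2)

# Linear transformation Y = aX + b with a = 2, b = 3.
y = {2 * x + 3: p for x, p in die.items()}
print(mean(y) == 2 * mu + 3, variance(y) == 4 * var)
```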


Unit 5 Events occurring at random and population
quantiles 


1. Poisson’s approximation for rare events states that for large
values of n and small values of p, the binomial random variable,
B(n, p), has approximately the same distribution as a Poisson random
variable with parameter np:

$B(n, p) \approx \text{Poisson}(np).$

Equivalently, if $\mu = np$, then

$B(n, \mu/n) \approx \text{Poisson}(\mu).$

A rough rule for using Poisson’s approximation is that: 


• if n is large and p is small (n > 50 and p < 0.05, say), then the
  approximation is good

• when p is small enough, the approximation is good even for quite
  small values of n; the smallest value of n for which the
  approximation is good decreases as the value of p decreases.
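
One way to see the rough rule at work is to compare the two p.m.f.s value by value. The sketch below is an illustration only: measuring the quality of the approximation by the largest absolute difference between the p.m.f.s is an assumption made here, and the function name is hypothetical.

```python
from math import comb, exp, factorial

def max_pmf_gap(n, p):
    """Largest absolute difference between the B(n, p) and
    Poisson(np) p.m.f.s over x = 0, 1, ..., n."""
    mu = n * p
    gap = 0.0
    for x in range(n + 1):
        binomial = comb(n, x) * p**x * (1 - p) ** (n - x)
        poisson = exp(-mu) * mu**x / factorial(x)
        gap = max(gap, abs(binomial - poisson))
    return gap

# Same n, decreasing p: the gap shrinks as p gets smaller.
for p in (0.5, 0.2, 0.05, 0.01):
    print(p, max_pmf_gap(50, p))
```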


A Bernoulli process is a sequence of Bernoulli trials in which: 

• trials are independent

• the probability of success, p, remains the same from trial to trial.

For a Bernoulli process:

• the number of successes in n trials has a binomial distribution
  with parameters n and p

• the waiting time from after one success up to and including the
  next success has a geometric distribution with parameter p.


A random variable X has an exponential distribution with
parameter $\lambda$, where $\lambda > 0$, if it has p.d.f.

$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0.$

This is written $X \sim M(\lambda)$. The c.d.f. of X is

$F(x) = 1 - e^{-\lambda x}, \quad x \ge 0.$


The Poisson process is a model for the occurrence of events in
continuous time in which:

• events occur singly
• the rate of occurrence of events remains constant
• the incidence of future events is independent of the past.


For a Poisson process in which events occur at random at rate $\lambda$:

• the number of events that occur during a time interval of length t
  has a Poisson distribution with parameter $\lambda t$

• the waiting time between successive events has an exponential
  distribution with parameter $\lambda$.


The parallels between the Bernoulli and Poisson processes are 
summarised below. 


Process     Type         Distribution of     Distribution of waiting
                         number of events    time between events

Bernoulli   discrete     binomial            geometric
Poisson     continuous   Poisson             exponential


For a continuous random variable X with c.d.f. F(x), the $\alpha$-quantile
is the value x which is the solution of the equation

$F(x) = \alpha.$

This value is denoted $q_\alpha$.


For a discrete random variable X with c.d.f. F(x), the $\alpha$-quantile,
$q_\alpha$, is the smallest value of x in the range of X satisfying $F(x) \ge \alpha$.


The population median, m, lower quartile, $q_L$, and upper
quartile, $q_U$, are those values of $q_\alpha$ corresponding to
$\alpha = \frac{1}{2}, \frac{1}{4}, \frac{3}{4}$, respectively.
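
The two quantile definitions can be illustrated with distributions from this unit. For the exponential distribution, solving $1 - e^{-\lambda x} = \alpha$ gives $q_\alpha = -\ln(1-\alpha)/\lambda$; for the geometric distribution the quantile is found by stepping through the range until the c.d.f. first reaches $\alpha$. This is a sketch with hypothetical function names, not a Handbook procedure.

```python
import math

def exponential_quantile(alpha, lam):
    """Solve F(x) = 1 - exp(-lam * x) = alpha for x (continuous case)."""
    return -math.log(1 - alpha) / lam

def geometric_quantile(alpha, p):
    """Smallest x in {1, 2, 3, ...} with F(x) = 1 - (1 - p)^x >= alpha
    (discrete case)."""
    x = 1
    while 1 - (1 - p) ** x < alpha:
        x += 1
    return x

# Median and quartiles of M(2) and of G(0.5).
for alpha in (0.25, 0.5, 0.75):
    print(alpha, exponential_quantile(alpha, 2.0), geometric_quantile(alpha, 0.5))
```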


Unit 6 Normal distributions 


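1. A random variable X has a normal distribution with mean $\mu$ and
standard deviation $\sigma$ (and hence variance $\sigma^2$), where $\sigma > 0$, if it has
p.d.f.

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty.$

This is written $X \sim N(\mu, \sigma^2)$. The normal distribution is symmetric
about $\mu$.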





If a normal distribution is used to model the variation in a population, 
then, according to the model, the proportion of the population within 
k standard deviations of the mean is the same, whatever the values of 
the mean $\mu$ and the standard deviation $\sigma$.


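If $X \sim N(\mu, \sigma^2)$ and a and b are constants, then

$Y = aX + b \sim N(a\mu + b, a^2\sigma^2).$


If $X_1, X_2, \ldots, X_n$ are independent normally distributed random
variables with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, then

$Y = X_1 + X_2 + \cdots + X_n \sim N(\mu_1 + \mu_2 + \cdots + \mu_n,\; \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2).$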


Table 1 of the statistical tables
contains probabilities $\Phi(z)$ for
the standard normal
distribution.


Table 2 of the statistical tables 
contains quantiles for the 
standard normal distribution. 


The normal distribution with mean 0 and standard deviation 1 is 
called the standard normal distribution. The letter Z is used to 
denote the standard normal random variable: $Z \sim N(0, 1)$. The c.d.f.
of Z is denoted $\Phi(z)$. The standard normal distribution is symmetric
about 0. It follows that: 


• for any z,

  $\Phi(-z) = 1 - \Phi(z)$

• if $q_\alpha$ is the $\alpha$-quantile of Z, then for any $0 < \alpha < 1$,

  $q_\alpha = -q_{1-\alpha}.$

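If $X \sim N(\mu, \sigma^2)$, then

$Z = \frac{X - \mu}{\sigma} \sim N(0, 1).$

Conversely, if $Z \sim N(0, 1)$, then

$X = \sigma Z + \mu \sim N(\mu, \sigma^2).$


If $X \sim N(\mu, \sigma^2)$, then:

• $P(X \le x) = P\left(Z \le \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right)$

• the $\alpha$-quantile, x, of X is

  $x = \sigma q_\alpha + \mu,$

  where $q_\alpha$ is the $\alpha$-quantile of Z.
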
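
The standardisation results above can be sketched in code. This illustration uses `statistics.NormalDist` from the Python standard library in place of Tables 1 and 2; the helper names and the N(100, 15²) example are assumptions made for the sketch.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # standard normal: Z.cdf is Phi, Z.inv_cdf gives q_alpha

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via Phi((x - mu) / sigma)."""
    return Z.cdf((x - mu) / sigma)

def normal_quantile(alpha, mu, sigma):
    """alpha-quantile of X ~ N(mu, sigma^2), via sigma * q_alpha + mu."""
    return sigma * Z.inv_cdf(alpha) + mu

# Assumed example: X ~ N(100, 15^2).
print(normal_cdf(115, 100, 15))         # proportion below one sd above the mean
print(normal_quantile(0.975, 100, 15))  # 0.975-quantile of X

# Symmetry of the standard normal distribution.
print(Z.cdf(-1.2), 1 - Z.cdf(1.2))
print(Z.inv_cdf(0.1), -Z.inv_cdf(0.9))
```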
Given a set of n ordered observations $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, a normal
probability plot is produced by plotting the n points $(x_{(i)}, y_i)$,
$i = 1, 2, \ldots, n$, on a graph, where $y_i$ is the quantile $q_{i/(n+1)}$ of a
standard normal distribution. If the points lie roughly along a straight
line, then a normal distribution is a plausible model for the variation
in the data.


If X is a random variable with mean $\mu$ and variance $\sigma^2$, and if a
random sample of size n is taken from the distribution of X, then the
mean and variance of the sample total $T_n$ are

$E(T_n) = n\mu, \quad V(T_n) = n\sigma^2,$

and the mean and variance of the sample mean $\bar{X}_n$ are

$E(\bar{X}_n) = \mu, \quad V(\bar{X}_n) = \frac{\sigma^2}{n}.$

If $X_1, X_2, \ldots, X_n$ are n independent random observations from a
population with mean $\mu$ and finite variance $\sigma^2$, then:

• the Central Limit Theorem states that for large n, the
  distribution of their mean $\bar{X}_n$ is approximately normal with mean
  $\mu$ and variance $\sigma^2/n$:

  $\bar{X}_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)$

• a corollary to the Central Limit Theorem states that for large n,

  $T_n = X_1 + X_2 + \cdots + X_n \approx N(n\mu, n\sigma^2).$


For random samples of size n from a normal distribution with mean $\mu$
and variance $\sigma^2$, the sample total $T_n$ and the sample mean $\bar{X}_n$ are
exactly normally distributed:

$T_n \sim N(n\mu, n\sigma^2), \quad \bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right).$


Unit 7 Point estimation 


1. There are many ways of obtaining (point) estimators, that is,
estimating formulas, for unknown model parameters. When an 
estimating formula is applied in a data context, the resulting number 
provides a (point) estimate of the unknown parameter. 


An estimator $\hat\theta$ is said to be unbiased for a parameter $\theta$ if $E(\hat\theta) = \theta$.
An estimator $\hat\theta$ is biased if it is not unbiased. The bias of $\hat\theta$ is

$E(\hat\theta) - \theta.$

The sample mean $\bar{X}$ and sample variance $S^2$ are unbiased estimators
of the population mean $\mu$ and the population variance $\sigma^2$, respectively.


If X is a discrete random variable with p.m.f. $p(x; \theta)$, then the
likelihood for the random sample $x_1, x_2, \ldots, x_n$ is

$L(\theta) = p(x_1; \theta) \times p(x_2; \theta) \times \cdots \times p(x_n; \theta).$


If X is a continuous random variable with p.d.f. $f(x; \theta)$, then the
likelihood for the random sample $x_1, x_2, \ldots, x_n$ is

$L(\theta) = f(x_1; \theta) \times f(x_2; \theta) \times \cdots \times f(x_n; \theta).$


For a given sample of data, the maximum likelihood estimate $\hat\theta$ of
$\theta$ is the value of $\theta$ which maximises the likelihood. This estimate is an
observation of the corresponding estimator, which is a random
variable. This estimator is also denoted $\hat\theta$ and is called the maximum
likelihood estimator of $\theta$. Both estimate and estimator are
abbreviated to MLE.


Having formed the likelihood $L(\theta)$, a procedure for calculating the
MLE $\hat\theta$ of $\theta$ that is adequate for the problems discussed in this module
is:

• differentiate $L(\theta)$ to obtain $L'(\theta)$

• solve the equation $L'(\theta) = 0$; if there is exactly one solution, then
  set $\hat\theta$ equal to that solution.
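
As a worked illustration of this procedure (an example chosen here, not one taken from the Handbook), consider a random sample $x_1, x_2, \ldots, x_n$ from a geometric distribution G(p), whose p.m.f. is given in the Unit 3 summary. Writing $s = x_1 + x_2 + \cdots + x_n$ and assuming $s > n$, the calculation can be sketched as follows.

```latex
\begin{align*}
L(p)  &= \prod_{i=1}^{n} (1-p)^{x_i - 1}\, p \;=\; p^{n} (1-p)^{s-n}, \\
L'(p) &= n p^{n-1} (1-p)^{s-n} - (s-n)\, p^{n} (1-p)^{s-n-1} \\
      &= p^{n-1} (1-p)^{s-n-1} \,(n - s p), \\
L'(p) &= 0 \ \text{for } 0 < p < 1
         \iff p = \frac{n}{s}
         \quad\Longrightarrow\quad
         \hat{p} = \frac{n}{s} = \frac{1}{\bar{x}}.
\end{align*}
```

There is exactly one solution in the interval 0 < p < 1, so under these assumptions the estimate is the reciprocal of the sample mean.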


MLEs possess the following properties: 


• they are sometimes unbiased and typically have small bias; also,
  they are asymptotically unbiased, that is,

  $E(\hat\theta) \to \theta$ as $n \to \infty$

• in addition,

  $V(\hat\theta) \to 0$ as $n \to \infty$.


Moreover, for large n, no unbiased estimator of $\theta$ has a smaller
variance than the MLE.


Table 2 of the statistical tables
contains quantiles for the
standard normal distribution.


Formulas for the MLEs of the parameters of the families of 
distributions considered in this module are given in the tables of 
discrete and continuous probability distributions in this Handbook. 


Unit 8 Interval estimation 


1. Given a random sample $x_1, x_2, \ldots, x_n$ of size n from a population with
mean $\mu$, an approximate $100(1-\alpha)\%$ confidence interval for $\mu$,
valid for large n, is

$(\mu^-, \mu^+) = \left(\bar{x} - z\frac{s}{\sqrt{n}},\; \bar{x} + z\frac{s}{\sqrt{n}}\right),$

where $\bar{x}$ is the sample mean, s is the sample standard deviation, and
$z = q_{1-\alpha/2}$, the $(1-\alpha/2)$-quantile of the standard normal
distribution. The limits $\mu^-$ and $\mu^+$ are, respectively, the lower and
upper $100(1-\alpha)\%$ confidence limits for $\mu$. This confidence
interval is sometimes called a z-interval.


A $100(1-\alpha)\%$ confidence interval $(\theta^-, \theta^+)$ for a population
parameter $\theta$, calculated from a sample of size n, may be interpreted as
follows: if a large number of samples of size n were drawn
independently from the population, and a $100(1-\alpha)\%$ confidence
interval calculated on each occasion, then approximately $100(1-\alpha)\%$
of these intervals would contain the true parameter $\theta$.


Suppose that $(\mu^-, \mu^+)$ is a $100(1-\alpha)\%$ confidence interval for $\mu$, and
$\theta = h(\mu)$. If the transformation h is either increasing or decreasing,
then the limits $h(\mu^-)$ and $h(\mu^+)$ define a $100(1-\alpha)\%$ confidence
interval for $\theta$ as follows:

• if h is increasing, then $(\theta^-, \theta^+) = (h(\mu^-), h(\mu^+))$
• if h is decreasing, then $(\theta^-, \theta^+) = (h(\mu^+), h(\mu^-))$.

An approximate $100(1-\alpha)\%$ confidence interval for a
proportion p, obtained by observing x successes in a sequence of n
independent Bernoulli trials each with probability of success p, is

$(p^-, p^+) = \left(\hat{p} - z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\; \hat{p} + z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right),$

where $\hat{p} = x/n$ and z is the $(1-\alpha/2)$-quantile of the standard
normal distribution. This confidence interval is valid when both $n\hat{p}$
and $n(1-\hat{p})$ are at least 5.
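
The interval for a proportion can be sketched as follows. This is an illustration only: the function name is hypothetical, the quantile z is computed with `statistics.NormalDist` rather than read from Table 2, and the validity condition is checked through the counts x and n − x, which is an assumed reading of the rule that both quantities must be at least 5.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Approximate confidence interval for a proportion from x successes
    in n independent Bernoulli trials."""
    # n * p_hat = x and n * (1 - p_hat) = n - x, so check the counts directly.
    if x < 5 or n - x < 5:
        raise ValueError("approximation not valid: need at least 5 successes and 5 failures")
    p_hat = x / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)-quantile of N(0, 1)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed data: 40 successes in 100 trials.
print(proportion_interval(40, 100))
```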


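5. Given observations $x_1$ and $x_2$ on independent binomial random
variables $X_1 \sim B(n_1, p_1)$ and $X_2 \sim B(n_2, p_2)$, an approximate
$100(1-\alpha)\%$ confidence interval for the difference $d = p_1 - p_2$ is

$(d^-, d^+) = \left(\hat{d} - z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},\; \hat{d} + z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\right),$

where $\hat{p}_1 = x_1/n_1$, $\hat{p}_2 = x_2/n_2$, $\hat{d} = \hat{p}_1 - \hat{p}_2$ and z is the
$(1-\alpha/2)$-quantile of the standard normal distribution.


6. An approximate $100(1-\alpha)\%$ confidence interval for the
Poisson parameter $\lambda$ is

$(\lambda^-, \lambda^+) = \left(\bar{x} - z\sqrt{\frac{\bar{x}}{n}},\; \bar{x} + z\sqrt{\frac{\bar{x}}{n}}\right),$

where $\bar{x}$ is the sample mean, and z is the $(1-\alpha/2)$-quantile of the
standard normal distribution. This confidence interval is valid when
$n\bar{x}$ is at least 30.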


7. In a random sample of size n with sample mean $\bar{X}$ and sample
standard deviation S from a normal distribution with mean $\mu$, the
random variable

$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$

has a t-distribution with n − 1 degrees of freedom. This is written
$T \sim t(n-1)$.


8. Given a random sample of size n with sample mean $\bar{x}$ and sample
standard deviation s from a normal distribution with mean $\mu$, a
$100(1-\alpha)\%$ confidence interval for $\mu$ is

$(\mu^-, \mu^+) = \left(\bar{x} - t\frac{s}{\sqrt{n}},\; \bar{x} + t\frac{s}{\sqrt{n}}\right),$

where t is the $(1-\alpha/2)$-quantile of $t(n-1)$. This confidence interval
is exact and is sometimes called a t-interval.


9. Given independent samples of size $n_1$ with sample variance $s_1^2$ and $n_2$
with sample variance $s_2^2$ from distributions with a common variance,
the pooled estimate of the common variance is

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$


Table 3 of the statistical tables
contains quantiles for
t-distributions.


10. If $n_1$ and $n_2$ are the sample sizes, and $\bar{x}_1$ and $\bar{x}_2$ are the sample means
of two independent samples from normal distributions with means $\mu_1$
and $\mu_2$ and common variance, then an exact $100(1-\alpha)\%$ confidence
interval for the difference between the means, $d = \mu_1 - \mu_2$, is

$(d^-, d^+) = \left(\bar{x}_1 - \bar{x}_2 - t\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\; \bar{x}_1 - \bar{x}_2 + t\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right),$

where t is the $(1-\alpha/2)$-quantile of $t(n_1 + n_2 - 2)$ and $s_p$ is the
pooled estimate of the common standard deviation. This confidence 
interval is exact and is sometimes called a two-sample t-interval. 
The assumption of equal variances is valid if the larger of the two 
sample variances divided by the smaller is less than 3. 
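
The pooled variance and the two-sample t-interval can be sketched as below. This is an illustration under stated assumptions: the function names and summary figures are hypothetical, and the t quantile is passed in as an argument because it would normally be read from Table 3 (the value 2.086 used here is taken to be the 0.975-quantile of t(20)).

```python
from math import sqrt

def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Pooled estimate of a common variance from two sample variances."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def two_sample_t_interval(n1, mean1, s1_sq, n2, mean2, s2_sq, t):
    """Confidence interval for mu_1 - mu_2 assuming a common variance.

    t is the (1 - alpha/2)-quantile of t(n1 + n2 - 2), supplied by the caller.
    """
    # Rule of thumb from the text: larger variance / smaller variance < 3.
    if max(s1_sq, s2_sq) / min(s1_sq, s2_sq) >= 3:
        raise ValueError("equal-variance assumption looks doubtful")
    s_p = sqrt(pooled_variance(n1, s1_sq, n2, s2_sq))
    half_width = t * s_p * sqrt(1 / n1 + 1 / n2)
    d_hat = mean1 - mean2
    return d_hat - half_width, d_hat + half_width

# Hypothetical summaries: n1 = 10, n2 = 12, so 20 degrees of freedom.
print(pooled_variance(10, 4.0, 12, 6.0))
print(two_sample_t_interval(10, 5.0, 4.0, 12, 3.0, 6.0, t=2.086))
```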





Unit 9 Testing hypotheses 


The main steps in a ‘fixed level’ hypothesis test are: 

• set up the null hypothesis, $H_0$, and the alternative
  hypothesis, $H_1$

• obtain some sample data and summarise these in the test
  statistic

• obtain the null distribution; this is the distribution of the test
  statistic under the assumption that $H_0$ is true

• decide on the significance level for the test; the significance
  level is the percentage of tests in which $H_0$ would be rejected
  when it is true and is usually one of 1%, 5% or 10%

• calculate the critical values for the significance level, and hence
  the rejection region for the test; the latter, defined by the
  former, is the set of extreme values of the test statistic which lead
  to rejection of $H_0$

• make one of two possible decisions:
  - reject $H_0$ if the test statistic lies in the rejection region
  - do not reject $H_0$ if the test statistic does not lie in the
    rejection region

• state the conclusion of the test in non-technical language.


There are two commonly used tests for testing a population mean,
$\mu$, that is, testing

$H_0: \mu = \mu_0$

against one of

$H_1: \mu \ne \mu_0$, or $H_1: \mu < \mu_0$, or $H_1: \mu > \mu_0$.


The z-test: 


• can be used whatever the underlying distribution when the
  sample size is large (n > 25)

• the test statistic is

  $Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$

• the null distribution is N(0, 1) so that critical values are found
  from the N(0, 1) quantile table.

Table 2 of the statistical tables
contains quantiles for the
standard normal distribution.

The t-test: 


• can be used for a normal population for any sample size

• the test statistic is

  $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$

• the null distribution is t(n − 1) so that critical values are found
  from the t(n − 1) quantile table.

Table 3 of the statistical tables
contains quantiles for
t-distributions.

To test a population proportion p, test 


$H_0: p = p_0$

against one of

$H_1: p \ne p_0$, or $H_1: p < p_0$, or $H_1: p > p_0$.

The following test can be used when the sample size is large (n > 25):

• the test statistic is

  $Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}},$

  where $\hat{p}$ is the sample proportion

• the null distribution is N(0, 1) so that critical values are found
  from the N(0, 1) quantile table.
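
The large-sample test for a proportion can be sketched as follows. This is an illustration only: the function name and the `alternative` argument are hypothetical, and the p-value is computed from `statistics.NormalDist` as the probability, under the N(0, 1) null distribution, of a value at least as extreme as the one observed.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0, alternative="two-sided"):
    """Test H0: p = p0 from x successes in n trials.

    Returns the observed test statistic z and its p-value.
    alternative is one of "two-sided", "less" (H1: p < p0),
    or "greater" (H1: p > p0).
    """
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    phi = NormalDist().cdf
    if alternative == "less":
        p_value = phi(z)
    elif alternative == "greater":
        p_value = 1 - phi(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("unknown alternative")
    return z, p_value

# Assumed data: 60 successes in 100 trials, testing H0: p = 0.5.
print(proportion_z_test(60, 100, 0.5))
```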


The main steps in using p-values for testing hypotheses are: 

• set up the null and alternative hypotheses

• obtain some sample data and summarise these in the test statistic

• obtain the null distribution of the test statistic

• identify all other values of the test statistic that are at least as
  extreme, in relation to the null and alternative hypotheses, as the
  value that was observed

• using the null distribution, calculate the p-value as the
  probability of observing a value of the test statistic at least as
  extreme as the value observed

• interpret the p-value

• state the conclusion of the test in non-technical language.


The table below provides a rough guide to interpreting p-values.

p-value              Rough interpretation

p > 0.10             little or no evidence against $H_0$
0.05 < p ≤ 0.10      weak evidence against $H_0$
0.01 < p ≤ 0.05      moderate evidence against $H_0$
p ≤ 0.01             strong evidence against $H_0$


Suppose that a hypothesis test results in a p-value p. Then a
hypothesis test at significance level $100\alpha\%$ would result in:

• the null hypothesis being rejected if $p \le \alpha$

• the null hypothesis not being rejected if $p > \alpha$.

The conclusions of a hypothesis test may be in error: 

• a Type I error occurs when we reject $H_0$ but it is true; then
  significance level = P(Type I error)

• a Type II error occurs when we do not reject $H_0$ but it is false

• there is a trade-off between the two error probabilities when
  designing a test.


The power of a test is

power = P(reject $H_0$ when $H_0$ is false) = 1 − P(Type II error).

It is desirable to have large power and small significance level. 


Suppose that a sample of size n is obtained from a population
distributed as $N(\mu, \sigma^2)$, where $\sigma^2$ is known, and the test statistic

$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

is to be used in a test of $H_0: \mu = \mu_0$ with significance level $\alpha$. Let
$d > 0$.

• When the alternative hypothesis is $H_1: \mu > \mu_0$ and the true value
  of $\mu$ is $\mu_0 + d$, or when the alternative hypothesis is $H_1: \mu < \mu_0$
  and the true value of $\mu$ is $\mu_0 - d$, then the power of the
  one-sided test is

  $1 - \Phi\left(q_{1-\alpha} - \frac{d\sqrt{n}}{\sigma}\right).$

• When the alternative hypothesis is $H_1: \mu \ne \mu_0$ and the true value
  of $\mu$ is $\mu_0 + d$, where d is not small, then the power of the
  two-sided test is approximately

  $1 - \Phi\left(q_{1-\alpha/2} - \frac{d\sqrt{n}}{\sigma}\right).$


10. Suppose that a sample of data is to be collected and one of the tests
described in the previous point is to be performed. Suppose also that
n is to be chosen so that the power of the test, when the true
underlying mean is $\mu_0 + d$, is equal to a predetermined value $\gamma$. The
required sample size is:

• for a one-sided test,

  $n = \frac{\sigma^2}{d^2}\left(q_{1-\alpha} - q_{1-\gamma}\right)^2$

• for a two-sided test,

  $n = \frac{\sigma^2}{d^2}\left(q_{1-\alpha/2} - q_{1-\gamma}\right)^2.$
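
The power and sample size formulas can be sketched as follows. This is an illustration only: the function names and numerical inputs are hypothetical, the significance level is given as a proportion $\alpha$, and rounding n up to the next whole number is an assumption made here so that the target power is reached.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal: cdf is Phi, inv_cdf gives quantiles

def power(n, sigma, d, alpha, two_sided=False):
    """Power of the z-test when the true mean is d away from mu_0
    (approximate in the two-sided case)."""
    q = _Z.inv_cdf(1 - alpha / 2) if two_sided else _Z.inv_cdf(1 - alpha)
    return 1 - _Z.cdf(q - d * sqrt(n) / sigma)

def required_sample_size(sigma, d, alpha, gamma, two_sided=False):
    """Smallest whole n giving power at least gamma."""
    q = _Z.inv_cdf(1 - alpha / 2) if two_sided else _Z.inv_cdf(1 - alpha)
    n = (sigma / d) ** 2 * (q - _Z.inv_cdf(1 - gamma)) ** 2
    return ceil(n)

# Hypothetical design: sigma = 10, d = 5, alpha = 0.05, target power 0.9.
n_one = required_sample_size(10, 5, 0.05, 0.9)
n_two = required_sample_size(10, 5, 0.05, 0.9, two_sided=True)
print(n_one, power(n_one, 10, 5, 0.05))
print(n_two, power(n_two, 10, 5, 0.05, two_sided=True))
```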


Unit 10 Nonparametric and goodness-of-fit tests 


1. The Wilcoxon signed rank test is a test on a single sample of data,
$x_1, x_2, \ldots, x_{n_1}$. Let m denote the underlying population median. To test

$H_0: m = m_0$

against one of

$H_1: m \ne m_0$, or $H_1: m < m_0$, or $H_1: m > m_0$,

the following test can be used:

• - if the data are a set of paired differences, then we are testing
    for a zero median, so $m_0 = 0$; set $d_i = x_i$, $i = 1, 2, \ldots, n_1$

  - for a single sample for which we are testing for a non-zero
    median, $m_0 \ne 0$, form the differences from the specified value
    $m_0$: $d_i = x_i - m_0$, $i = 1, 2, \ldots, n_1$

• in either case, delete any zeros from the dataset of differences and
  let $n \le n_1$ be the sample size of the dataset with zeros removed

• rewrite the null and alternative hypotheses with $m_0 = 0$

• without regard to their sign, order the absolute values of the
  differences from least to greatest, and allocate rank i to the ith
  absolute difference; in the event of ties, allocate the average rank
  to the tied differences

• consider again the signs of the original differences; the Wilcoxon
  signed rank test statistic, $w_+$, is the sum of the ranks of the
  positive differences

• obtain the p-value, p

• interpret p and state your conclusions.


Under the null hypothesis, for a sample of size n (excluding any zero
differences), the Wilcoxon signed rank test statistic $W_+$ has mean and
variance given by

$E(W_+) = \frac{n(n+1)}{4}, \quad V(W_+) = \frac{n(n+1)(2n+1)}{24}.$

The distribution of

$Z = \frac{W_+ - E(W_+)}{\sqrt{V(W_+)}}$

is approximately standard normal. The approximation is adequate
provided that n > 16.

The Mann-Whitney test is a test on two independent samples of
data. Let $\ell$ denote the underlying difference in location between the
populations from which the samples were drawn. To test

$H_0: \ell = 0$

against one of

$H_1: \ell \ne 0$, or $H_1: \ell < 0$, or $H_1: \ell > 0$,

the following test can be used:

• pool the two samples, keeping track of the sample to which each
  data value belongs

• order the pooled data values from least to greatest, and allocate
  rank i to the ith pooled value; in the event of ties, allocate the
  average rank to the tied values

• the Mann-Whitney test statistic, $u_A$, is the sum of the ranks for
  one of the samples

• obtain the p-value, p

• interpret p and state your conclusions.


For independent samples of sizes $n_A$ and $n_B$, the null distribution of
the Mann-Whitney test statistic $U_A$ has mean and variance given by

$E(U_A) = \frac{n_A(n_A + n_B + 1)}{2}, \quad V(U_A) = \frac{n_A n_B (n_A + n_B + 1)}{12}.$

The distribution of

$Z = \frac{U_A - E(U_A)}{\sqrt{V(U_A)}}$

is approximately standard normal. The approximation can be used if
$n_A > 8$ and $n_B > 8$ and there are not too many tied values in the
pooled dataset.


The random variable W given by the sum of the squares of 
r independent standard normal random variables has a chi-squared 
distribution with r degrees of freedom. This is written $W \sim \chi^2(r)$.


Given a random sample for which each observation can be classified
into one of k distinct categories, the chi-squared goodness-of-fit
test involves the comparison of the observed frequencies for the
categories and the frequencies expected under a hypothesised model.
The test statistic is

$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$

where $O_i$ and $E_i$ are the observed and expected frequencies for
category i. The categories must be chosen in such a way that the
expected frequency for each category is at least 5. Then, under the
null hypothesis that the data arise from the hypothesised model, the
distribution of the test statistic $\chi^2$ is approximately chi-squared with
k − p − 1 degrees of freedom, where p is the number of parameters
whose values were estimated from the data. The p-value is given by
the upper tail probability of $\chi^2(k - p - 1)$ for values exceeding the
observed test statistic.

Table 4 of the statistical tables
contains quantiles for
chi-squared distributions.
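
The test statistic and its degrees of freedom can be computed as in the sketch below. This is an illustration only: the function name and the die-rolling data are hypothetical, and the p-value is left to be found from a chi-squared table rather than computed here.

```python
def chi_squared_statistic(observed, expected, n_estimated=0):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test.

    n_estimated is p, the number of parameters estimated from the data.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if min(expected) < 5:
        raise ValueError("pool categories so every expected frequency is at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - n_estimated - 1
    return statistic, degrees_of_freedom

# Hypothetical data: 120 rolls of a die, hypothesised model "fair die".
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6
print(chi_squared_statistic(observed, expected))
```

No parameters are estimated in this example, so the degrees of freedom are k − 1 = 5.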








Unit 11 Regression 


1. When a response variable Y is related to the value of an
explanatory variable x, then the relationship can be represented by
a general regression model

$Y_i = h(x_i) + W_i, \quad i = 1, 2, \ldots, n.$

Here h represents some regression function and the $W_i$ are
independent random variables with zero mean.


An important regression model is the (simple) linear regression
model, where Y depends linearly on x, that is,

$Y_i = \alpha + \beta x_i + W_i, \quad i = 1, 2, \ldots, n.$

The line $y = \alpha + \beta x$ is called the regression line, with parameter $\alpha$
being the intercept and $\beta$ the slope. The random terms $W_i$ are
independent with zero mean and constant variance $\sigma^2$. Often, the $W_i$
are additionally assumed to be normally distributed.


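The following notation is standard:

$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$

$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$

$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$


Given data, the parameters of the linear regression model may be
estimated using the method of least squares by minimising the sum
of squared residuals

$\sum_{i=1}^{n} (y_i - (\alpha + \beta x_i))^2.$

• The least squares estimate of $\beta$ is

  $\hat\beta = \frac{S_{xy}}{S_{xx}}.$

• The least squares estimate of $\alpha$ is

  $\hat\alpha = \bar{y} - \hat\beta \bar{x}.$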

• The least squares line is

  $y = \hat\alpha + \hat\beta x.$
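
The least squares estimates can be computed directly from the definitions of $S_{xx}$ and $S_{xy}$. The sketch below is an illustration with a hypothetical function name and made-up data.

```python
def least_squares(xs, ys):
    """Return (alpha_hat, beta_hat) for the line y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Made-up data lying close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
alpha_hat, beta_hat = least_squares(xs, ys)
print(alpha_hat, beta_hat)

# Fitted values and residuals, as used in a residual plot.
fitted = [alpha_hat + beta_hat * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
print(residuals)
```

A useful check on the arithmetic is that the residuals from a least squares fit sum to zero, up to rounding.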


5. The assumption that the $W_i$ have constant, zero mean and constant
variance can be checked using a residual plot in which the residuals
$w_i = y_i - \hat{y}_i$ are plotted against the fitted values $\hat{y}_i = \hat\alpha + \hat\beta x_i$. If this
assumption is satisfied, then the further assumption of normality of
the $W_i$ can be checked using a normal probability plot of the
residuals.








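6. An unbiased estimator of $\sigma^2$ is

$S^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2}.$


7. Assuming that, independently, $W_i \sim N(0, \sigma^2)$, $i = 1, 2, \ldots, n$, we have

$\hat\beta \sim N\left(\beta, \frac{\sigma^2}{S_{xx}}\right), \quad \frac{(n-2)S^2}{\sigma^2} \sim \chi^2(n-2).$

These two results can be combined to give

$\frac{\hat\beta - \beta}{S/\sqrt{S_{xx}}} \sim t(n-2).$

This result can be used to test $H_0: \beta = 0$.

Table 3 of the statistical tables
contains quantiles for
t-distributions.


8. If the random terms $W_i$ are normally distributed, then a
$100(1-\alpha)\%$ confidence interval for the parameter $\beta$ is

$(\beta^-, \beta^+) = \left(\hat\beta - t\frac{s}{\sqrt{S_{xx}}},\; \hat\beta + t\frac{s}{\sqrt{S_{xx}}}\right),$

where t is the $(1-\alpha/2)$-quantile of $t(n-2)$.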


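9. If the random terms $W_i$ are normally distributed, then, for a given $x_0$,
a $100(1-\alpha)\%$ confidence interval for the mean response is

$\left(\hat\alpha + \hat\beta x_0 - t\,s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},\; \hat\alpha + \hat\beta x_0 + t\,s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\right),$

where t is the $(1-\alpha/2)$-quantile of $t(n-2)$.


10. If the random terms $W_i$ are normally distributed, then, for a given $x_0$,
a $100(1-\alpha)\%$ prediction interval for the response is

$\left(\hat\alpha + \hat\beta x_0 - t\,s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} + 1},\; \hat\alpha + \hat\beta x_0 + t\,s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} + 1}\right),$

where t is the $(1-\alpha/2)$-quantile of $t(n-2)$.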


11. If data comprise observations on $p$ explanatory variables $x_1, x_2, \ldots, x_p$
and a response variable $Y$, then the multiple linear regression
model can be written

$$Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + W_i, \quad i = 1, 2, \ldots, n.$$

The terms $W_i$ are independent normal random variables with zero
mean and constant variance.
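
As an illustration of the simple linear regression results above, the following Python sketch computes the least squares estimates, the estimate $S^2$, the test statistic for $H_0 : \beta = 0$ and the three intervals. The dataset, the variable names and the choice $x_0 = 4$ are assumptions made here for illustration only; the value 3.182 is the 0.975-quantile of $t(3)$ read from Table 3.

```python
import math

# Hypothetical illustrative data (not taken from the module).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Standard notation: sums of squares and products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of the slope and the intercept.
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values, residuals and the unbiased estimate of sigma^2.
fitted = [alpha_hat + beta_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
s_squared = sum(w ** 2 for w in residuals) / (n - 2)
s = math.sqrt(s_squared)

# Test statistic for H0: beta = 0, to be compared with t(n - 2).
t_statistic = beta_hat / (s / math.sqrt(s_xx))

# 0.975-quantile of t(n - 2) = t(3), read from Table 3 (for 95% intervals).
t_quantile = 3.182

# 95% confidence interval for the slope.
slope_margin = t_quantile * s / math.sqrt(s_xx)
slope_ci = (beta_hat - slope_margin, beta_hat + slope_margin)

# 95% intervals at a chosen value x0 of the explanatory variable.
x0 = 4.0
centre = alpha_hat + beta_hat * x0
spread = 1 / n + (x0 - x_bar) ** 2 / s_xx
mean_margin = t_quantile * s * math.sqrt(spread)
pred_margin = t_quantile * s * math.sqrt(spread + 1)
mean_ci = (centre - mean_margin, centre + mean_margin)
prediction_interval = (centre - pred_margin, centre + pred_margin)

print("slope, intercept:", beta_hat, alpha_hat)
print("S^2:", s_squared, " t statistic:", t_statistic)
print("CI for slope:", slope_ci)
print("CI for mean response at x0:", mean_ci)
print("prediction interval at x0:", prediction_interval)
```

For these data the estimates are $\hat{\beta} = 0.6$, $\hat{\alpha} = 2.2$ and $S^2 = 0.8$. The prediction interval is wider than the confidence interval for the mean response because of the additional 1 under the square root.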




Unit 12 Transformations and the modelling process 


1. The ladder of powers lists transformations of the form

$$\ldots,\ x^{-2},\ x^{-1},\ x^{-1/2},\ \log x,\ x^{1/2},\ x^{1},\ x^{2},\ x^{3},\ x^{4},\ \ldots$$

When transforming skew, positive data to make them more symmetric, 
and hence more amenable to modelling with a normal distribution: 


• for right-skew data, go down the ladder of powers
• for left-skew data, go up the ladder of powers.
2. In linear regression, it is sometimes possible to: 


• straighten out the regression function by transforming the
explanatory variable

• make the assumptions associated with the random terms conform
to those of the linear regression model by transforming the
response variable.


3. A statistical report comprises the following sections: Summary,
Introduction, Methods, Results, Discussion:


• the Summary should be self-contained and should be written in
largely non-technical language; it should state briefly the aim of
the analysis, the methods used, the key finding(s), and the
interpretation

• the Introduction should contain a brief description of the question
or hypothesis to be investigated, the setting in which the data
were collected, and the data available

• the Methods section should include a description of the model, the
procedures used to check the model, the statistical tests employed,
the methods used to calculate confidence intervals, and any other
relevant techniques used

• the Results section should contain descriptive summaries of the
data, evidence that the model is appropriate, and the numerical
results of statistical tests and confidence interval calculations

• the Discussion should contain your assessment of the statistical
evidence relating to the original question or hypothesis.
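
The effect of moving down the ladder of powers on right-skew data can be sketched in Python as follows. The dataset is made up, and the moment-based skewness coefficient used to measure asymmetry is an assumption introduced here for illustration; it is not a measure defined in this summary.

```python
import math


def sample_skewness(values):
    """Moment-based skewness: positive for right-skew, negative for left-skew."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skew, positive data (not taken from the module).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

# Two steps down the ladder from x^1: the square root, then the log.
sqrt_data = [math.sqrt(v) for v in raw]
log_data = [math.log(v) for v in raw]

for label, data in [("x", raw), ("sqrt(x)", sqrt_data), ("log(x)", log_data)]:
    print(f"{label:8s} skewness = {sample_skewness(data):.2f}")
```

Each step down the ladder reduces the skewness coefficient for these data, so the transformed values are more symmetric than the originals.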


Unit 13 Applications 


This unit uses techniques from the previous units to solve applied 
problems. 
2 Some mathematical results





1. The binomial coefficient is

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}, \quad \text{where } x! = 1 \times 2 \times \cdots \times x \text{ and } 0! = 1.$$

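2. Integration rules include:

• for $k \neq -1$,

$$\int x^k \, dx = \frac{x^{k+1}}{k+1} + c$$

• if $g(x), h(x), \ldots, q(x)$ are any functions of $x$, then

$$\int \{g(x) + h(x) + \cdots + q(x)\} \, dx = \int g(x)\,dx + \int h(x)\,dx + \cdots + \int q(x)\,dx$$

• if $g(x)$ is any function of $x$ and $a$ is a constant, then

$$\int a\,g(x)\,dx = a \int g(x)\,dx$$

• if $f(x)$ is a function such that $\int f(x)\,dx = G(x) + c$, and $x_1 < x_2$,
then

$$\int_{x_1}^{x_2} f(x)\,dx = \left[G(x)\right]_{x_1}^{x_2} = G(x_2) - G(x_1).$$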

3. Differentiation rules include:

• if $f(x) = a x^k$, then $f'(x) = k a x^{k-1}$

• let $g(x), h(x), \ldots, q(x)$ be any functions of $x$, and $a, b, \ldots, k$ be
constants:
if $f(x) = a\,g(x) + b\,h(x) + \cdots + k\,q(x)$,
then $f'(x) = a\,g'(x) + b\,h'(x) + \cdots + k\,q'(x)$

• if $f(x) = a e^{kx}$, then $f'(x) = k a e^{kx}$

• the chain rule:
if $f(x) = h(g(x))$ for suitable functions $g$ and $h$,
then $f'(x) = g'(x)\, h'(g(x))$

• the product rule:
if $f(x) = g(x) \times h(x)$ for any functions $g$ and $h$,
then $f'(x) = g'(x)\, h(x) + g(x)\, h'(x)$.
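
Some of the results in this section can be checked numerically. The Python sketch below evaluates a binomial coefficient from factorials, a definite integral from the power rule, and compares the chain rule with a finite-difference derivative. The function names and the example functions $g(x) = x^2$ and $h(u) = e^{3u}$ are assumptions chosen here for illustration.

```python
import math


def binomial_coefficient(n, x):
    """n! / (x! (n - x)!), computed directly from factorials."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))


def power_antiderivative(k):
    """Return G(x) = x^(k+1) / (k+1), an antiderivative of x^k for k != -1."""
    if k == -1:
        raise ValueError("the power rule does not apply when k = -1")
    return lambda x: x ** (k + 1) / (k + 1)


def definite_integral(G, x1, x2):
    """Integral of f from x1 to x2, given an antiderivative G of f."""
    return G(x2) - G(x1)


def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x):
    """f(x) = h(g(x)) with g(x) = x^2 and h(u) = exp(3u)."""
    return math.exp(3 * x ** 2)


def f_prime_by_chain_rule(x):
    """g'(x) h'(g(x)) = (2x) * (3 exp(3 x^2))."""
    return (2 * x) * (3 * math.exp(3 * x ** 2))


print(binomial_coefficient(5, 2))
print(definite_integral(power_antiderivative(2), 1, 2))
print(numerical_derivative(f, 0.5), f_prime_by_chain_rule(0.5))
```

The last line prints two values that agree closely, which is what the chain rule predicts.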
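3 Table of discrete probability distributions

Name and abbreviation | Probability mass function $p(x)$ | Cumulative distribution function $F(x)$ | Range | Parameter values | Mean $\mu$ | Variance $\sigma^2$ | Maximum likelihood estimator of parameter
--- | --- | --- | --- | --- | --- | --- | ---
Bernoulli, Bernoulli($p$) | $p(0) = 1 - p$, $p(1) = p$ | | $0, 1$ | $0 < p < 1$ | $p$ | $p(1 - p)$ |
Binomial, $B(n, p)$ | $\binom{n}{x} p^x (1 - p)^{n - x}$ | | $0, 1, \ldots, n$ | $n = 1, 2, \ldots$; $0 < p < 1$ | $np$ | $np(1 - p)$ | $\hat{p} = X/n$
Discrete uniform | $\dfrac{1}{n - m + 1}$ | $\dfrac{x - m + 1}{n - m + 1}$ | $m, m + 1, \ldots, n$ | $m < n$, both whole numbers | $\dfrac{n + m}{2}$ | $\dfrac{(n - m)(n - m + 2)}{12}$ |
Geometric, $G(p)$ | $(1 - p)^{x - 1} p$ | $1 - (1 - p)^x$ | $1, 2, \ldots$ | $0 < p < 1$ | $\dfrac{1}{p}$ | $\dfrac{1 - p}{p^2}$ | $\hat{p} = 1/\bar{X}$
Poisson, Poisson($\lambda$) | $\dfrac{e^{-\lambda} \lambda^x}{x!}$ | | $0, 1, 2, \ldots$ | $\lambda > 0$ | $\lambda$ | $\lambda$ | $\hat{\lambda} = \bar{X}$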
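
The entries in the table of discrete distributions can be checked by direct summation. The Python sketch below does this for a binomial mean and variance and for the geometric c.d.f., and evaluates the geometric maximum likelihood estimate for a small sample. The parameter values, the sample and the function names are assumptions chosen here for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """p(x) for B(n, p), x = 0, 1, ..., n."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def geometric_pmf(x, p):
    """p(x) for G(p), x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p


def geometric_cdf(x, p):
    """F(x) for G(p), x = 1, 2, ..."""
    return 1 - (1 - p) ** x


# Mean and variance of B(10, 0.3) by summing over the range.
n, p = 10, 0.3
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
print(mean, n * p)
print(variance, n * p * (1 - p))

# The geometric c.d.f. at 4 equals p(1) + p(2) + p(3) + p(4).
cdf_by_summation = sum(geometric_pmf(x, 0.25) for x in range(1, 5))
print(cdf_by_summation, geometric_cdf(4, 0.25))

# Maximum likelihood estimate of p for a hypothetical geometric sample.
sample = [3, 1, 4, 2]
p_hat = 1 / (sum(sample) / len(sample))
print(p_hat)
```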


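4 Table of continuous probability distributions

Name and abbreviation | Probability density function $f(x)$ | Cumulative distribution function $F(x)$ | Range | Parameter values | Mean $\mu$ | Variance $\sigma^2$ | Maximum likelihood estimator(s) of parameter(s)
--- | --- | --- | --- | --- | --- | --- | ---
Chi-squared, $\chi^2(r)$ | | | $x > 0$ | $r = 1, 2, \ldots$ | $r$ | $2r$ |
Continuous uniform, $U(a, b)$ | $\dfrac{1}{b - a}$ | $\dfrac{x - a}{b - a}$ | $a < x < b$ | $-\infty < a < b < \infty$ | $\dfrac{a + b}{2}$ | $\dfrac{(b - a)^2}{12}$ |
Exponential, $M(\lambda)$ | $\lambda e^{-\lambda x}$ | $1 - e^{-\lambda x}$ | $x > 0$ | $\lambda > 0$ | $\dfrac{1}{\lambda}$ | $\dfrac{1}{\lambda^2}$ |
Normal, $N(\mu, \sigma^2)$ | $\dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right]$ | $\Phi\left(\dfrac{x - \mu}{\sigma}\right)$ | $-\infty < x < \infty$ | $-\infty < \mu < \infty$, $\sigma > 0$ | $\mu$ | $\sigma^2$ |
Standard normal, $N(0, 1)$ | $\dfrac{1}{\sqrt{2\pi}} \exp\left(-\dfrac{1}{2}x^2\right)$ | $\Phi(x)$ | $-\infty < x < \infty$ | | $0$ | $1$ |
Student's $t$, $t(\nu)$ | proportional to $\left(1 + \dfrac{x^2}{\nu}\right)^{-(\nu + 1)/2}$ | | $-\infty < x < \infty$ | $\nu = 1, 2, \ldots$ | $0$ ($\nu > 1$) | $\dfrac{\nu}{\nu - 2}$ ($\nu > 2$) |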
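
The cumulative distribution functions in the table of continuous distributions translate directly into code. The Python sketch below is a minimal illustration; the function names and the numerical arguments are assumptions made here, and the standard normal c.d.f. $\Phi$ is taken from the `statistics.NormalDist` class in the Python standard library.

```python
import math
from statistics import NormalDist


def uniform_cdf(x, a, b):
    """F(x) = (x - a) / (b - a) for U(a, b), valid for a < x < b."""
    return (x - a) / (b - a)


def exponential_cdf(x, lam):
    """F(x) = 1 - exp(-lambda x) for M(lambda), valid for x > 0."""
    return 1 - math.exp(-lam * x)


def normal_cdf(x, mu, sigma):
    """F(x) = Phi((x - mu) / sigma) for N(mu, sigma^2)."""
    return NormalDist().cdf((x - mu) / sigma)


# A quarter of the way along the interval (2, 4).
print(uniform_cdf(2.5, 2, 4))

# The median of M(0.5) is log(2) / 0.5, so the c.d.f. there is one half.
print(exponential_cdf(math.log(2) / 0.5, 0.5))

# P(X <= 13) for X ~ N(10, 2^2), by standardising to z = 1.5.
print(normal_cdf(13, 10, 2))
```

Standardising in `normal_cdf` is what allows a single table of $\Phi$ to serve every normal distribution.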
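Table 1 Probabilities for the standard normal distribution $\Phi(z) = P(Z \le z)$

z | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359
0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753
0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141
0.3 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517
0.4 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879
0.5 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224
0.6 | 0.7257 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7517 | 0.7549
0.7 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852
0.8 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133
0.9 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389
1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621
1.1 | 0.8643 | 0.8665 | 0.8686 | 0.8708 | 0.8729 | 0.8749 | 0.8770 | 0.8790 | 0.8810 | 0.8830
1.2 | 0.8849 | 0.8869 | 0.8888 | 0.8907 | 0.8925 | 0.8944 | 0.8962 | 0.8980 | 0.8997 | 0.9015
1.3 | 0.9032 | 0.9049 | 0.9066 | 0.9082 | 0.9099 | 0.9115 | 0.9131 | 0.9147 | 0.9162 | 0.9177
1.4 | 0.9192 | 0.9207 | 0.9222 | 0.9236 | 0.9251 | 0.9265 | 0.9279 | 0.9292 | 0.9306 | 0.9319
1.5 | 0.9332 | 0.9345 | 0.9357 | 0.9370 | 0.9382 | 0.9394 | 0.9406 | 0.9418 | 0.9429 | 0.9441
1.6 | 0.9452 | 0.9463 | 0.9474 | 0.9484 | 0.9495 | 0.9505 | 0.9515 | 0.9525 | 0.9535 | 0.9545
1.7 | 0.9554 | 0.9564 | 0.9573 | 0.9582 | 0.9591 | 0.9599 | 0.9608 | 0.9616 | 0.9625 | 0.9633
1.8 | 0.9641 | 0.9649 | 0.9656 | 0.9664 | 0.9671 | 0.9678 | 0.9686 | 0.9693 | 0.9699 | 0.9706
1.9 | 0.9713 | 0.9719 | 0.9726 | 0.9732 | 0.9738 | 0.9744 | 0.9750 | 0.9756 | 0.9761 | 0.9767
2.0 | 0.9772 | 0.9778 | 0.9783 | 0.9788 | 0.9793 | 0.9798 | 0.9803 | 0.9808 | 0.9812 | 0.9817
2.1 | 0.9821 | 0.9826 | 0.9830 | 0.9834 | 0.9838 | 0.9842 | 0.9846 | 0.9850 | 0.9854 | 0.9857
2.2 | 0.9861 | 0.9864 | 0.9868 | 0.9871 | 0.9875 | 0.9878 | 0.9881 | 0.9884 | 0.9887 | 0.9890
2.3 | 0.9893 | 0.9896 | 0.9898 | 0.9901 | 0.9904 | 0.9906 | 0.9909 | 0.9911 | 0.9913 | 0.9916
2.4 | 0.9918 | 0.9920 | 0.9922 | 0.9925 | 0.9927 | 0.9929 | 0.9931 | 0.9932 | 0.9934 | 0.9936
2.5 | 0.9938 | 0.9940 | 0.9941 | 0.9943 | 0.9945 | 0.9946 | 0.9948 | 0.9949 | 0.9951 | 0.9952
2.6 | 0.9953 | 0.9955 | 0.9956 | 0.9957 | 0.9959 | 0.9960 | 0.9961 | 0.9962 | 0.9963 | 0.9964
2.7 | 0.9965 | 0.9966 | 0.9967 | 0.9968 | 0.9969 | 0.9970 | 0.9971 | 0.9972 | 0.9973 | 0.9974
2.8 | 0.9974 | 0.9975 | 0.9976 | 0.9977 | 0.9977 | 0.9978 | 0.9979 | 0.9979 | 0.9980 | 0.9981
2.9 | 0.9981 | 0.9982 | 0.9982 | 0.9983 | 0.9984 | 0.9984 | 0.9985 | 0.9985 | 0.9986 | 0.9986
3.0 | 0.9987 | 0.9987 | 0.9987 | 0.9988 | 0.9988 | 0.9989 | 0.9989 | 0.9989 | 0.9990 | 0.9990
3.1 | 0.9990 | 0.9991 | 0.9991 | 0.9991 | 0.9992 | 0.9992 | 0.9992 | 0.9992 | 0.9993 | 0.9993
3.2 | 0.9993 | 0.9993 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9995 | 0.9995 | 0.9995
3.3 | 0.9995 | 0.9995 | 0.9995 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9997
3.4 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9998
3.5 | 0.9998 | 0.9998 | 0.9998 | 0.9998 | 0.9998 | 0.9998 | 0.9998 | 0.9998 | 0.9998 | 0.9998
3.6 | 0.9998 | 0.9998 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999
3.7 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999
3.8 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999 | 0.9999
3.9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
4.0 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000

Example: $\Phi(1.58) = 0.9429$.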
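Table 2 Quantiles for the standard normal distribution, $\Phi(q_\alpha) = \alpha$

$\alpha$ | $q_\alpha$
--- | ---
0.50 | 0.00000
0.51 | 0.02507
0.52 | 0.05015
0.53 | 0.07527
0.54 | 0.1004
0.55 | 0.1257
0.56 | 0.1510
0.57 | 0.1764
0.58 | 0.2019
0.59 | 0.2275
0.60 | 0.2533
0.61 | 0.2793
0.62 | 0.3055
0.63 | 0.3319
0.64 | 0.3585
0.65 | 0.3853
0.66 | 0.4125
0.67 | 0.4399
0.68 | 0.4677
0.69 | 0.4959
0.70 | 0.5244
0.71 | 0.5534
0.72 | 0.5828
0.73 | 0.6128
0.74 | 0.6433
0.75 | 0.6745
0.76 | 0.7063
0.77 | 0.7388
0.78 | 0.7722
0.79 | 0.8064
0.80 | 0.8416
0.81 | 0.8779
0.82 | 0.9154
0.83 | 0.9542

Example: $q_{0.950} = 1.645$.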
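
The two worked examples for Tables 1 and 2 can be reproduced in Python with the standard library class `statistics.NormalDist`, as sketched below. The variable names are assumptions made here for illustration. The sketch also shows how symmetry gives probabilities for negative $z$, which Table 1 does not list.

```python
from statistics import NormalDist

# Standard normal distribution, N(0, 1).
Z = NormalDist()

# Table 1 lookup: Phi(1.58), from row 1.5 and column 8.
phi_158 = Z.cdf(1.58)
print(round(phi_158, 4))

# Table 2 lookup: the 0.950-quantile, that is, q with Phi(q) = 0.950.
q_950 = Z.inv_cdf(0.950)
print(round(q_950, 3))

# Symmetry about zero: Phi(-z) = 1 - Phi(z).
phi_minus_158 = 1 - phi_158
print(round(phi_minus_158, 4), round(Z.cdf(-1.58), 4))
```

Rounded to the precision of the tables, the first two printed values are 0.9429 and 1.645, matching the examples.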
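Table 3 Quantiles for t-distributions

df | 0.90 | 0.95 | 0.975 | 0.99 | 0.995 | 0.999
--- | --- | --- | --- | --- | --- | ---
1 | | | 12.71 | 31.82 | 63.66 | 318.3
2 | | | 4.303 | 6.965 | 9.925 | 22.33
3 | | | 3.182 | 4.541 | 5.841 | 10.21
4 | | | 2.776 | 3.747 | 4.604 | 7.173
5 | | | 2.571 | 3.365 | 4.032 | 5.893
6 | | | 2.447 | 3.143 | 3.707 | 5.208
7 | | | 2.365 | 2.998 | 3.499 | 4.785
8 | | | 2.306 | 2.896 | 3.355 | 4.501
9 | | | 2.262 | 2.821 | 3.250 | 4.297
10 | | | 2.228 | 2.764 | 3.169 | 4.144
11 | | | 2.201 | 2.718 | 3.106 | 4.025
12 | | | 2.179 | 2.681 | 3.055 | 3.930
13 | | | 2.160 | 2.650 | 3.012 | 3.852
14 | | | 2.145 | 2.624 | 2.977 | 3.787
15 | | | 2.131 | 2.602 | 2.947 | 3.733
16 | | | 2.120 | 2.583 | 2.921 | 3.686
17 | | | 2.110 | 2.567 | 2.898 | 3.646
18 | | | 2.101 | 2.552 | 2.878 | 3.610
19 | | | 2.093 | 2.539 | 2.861 | 3.579
20 | | | 2.086 | 2.528 | 2.845 | 3.552
21 | | | 2.080 | 2.518 | 2.831 | 3.527
22 | | | 2.074 | 2.508 | 2.819 | 3.505
23 | | | 2.069 | 2.500 | 2.807 | 3.485
24 | | | 2.064 | 2.492 | 2.797 | 3.467
25 | | | 2.060 | 2.485 | 2.787 | 3.450
26 | | | 2.056 | 2.479 | 2.779 | 3.435
27 | | | 2.052 | 2.473 | 2.771 | 3.421
28 | | | 2.048 | 2.467 | 2.763 | 3.408
29 | | | 2.045 | 2.462 | 2.756 | 3.396
30 | | | 2.042 | 2.457 | 2.750 | 3.385
31 | | | 2.040 | 2.453 | 2.744 | 3.375
32 | | | 2.037 | 2.449 | 2.738 | 3.365
33 | | | 2.035 | 2.445 | 2.733 | 3.356
34 | | | 2.032 | 2.441 | 2.728 | 3.348
35 | | | 2.030 | 2.438 | 2.724 | 3.340
36 | | | 2.028 | 2.434 | 2.719 | 3.333
37 | | | 2.026 | 2.431 | 2.715 | 3.326
38 | | | 2.024 | 2.429 | 2.712 | 3.319
39 | | | 2.023 | 2.426 | 2.708 | 3.313
40 | | | 2.021 | 2.423 | 2.704 | 3.307
45 | | | 2.014 | 2.412 | 2.690 | 3.281
50 | | | 2.009 | 2.403 | 2.678 | 3.261
55 | | | 2.004 | 2.396 | 2.668 | 3.245
60 | | | 2.000 | 2.390 | 2.660 | 3.232
65 | | | 1.997 | 2.385 | 2.654 | 3.220
70 | | | 1.994 | 2.381 | 2.648 | 3.211
75 | | | 1.992 | 2.377 | 2.643 | 3.202
80 | | | 1.990 | 2.374 | 2.639 | 3.195
85 | | | 1.988 | 2.371 | 2.635 | 3.189
90 | | | 1.987 | 2.368 | 2.632 | 3.183
100 | | | 1.984 | 2.364 | 2.626 | 3.174

Example: $P(T \le 2.262) = 0.975$, where $T \sim t(9)$.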
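
The worked example for Table 3 can be checked by integrating the density of $t(9)$ numerically. The Python sketch below is one way to do this with the standard library only. The normalising constant (written with the gamma function) and the use of Simpson's rule are assumptions introduced here, since the handbook gives the density only up to proportionality; the function names are hypothetical.

```python
import math


def t_density(x, nu):
    """Density of t(nu): a constant times (1 + x^2 / nu)^(-(nu + 1) / 2)."""
    constant = math.gamma((nu + 1) / 2) / (
        math.sqrt(nu * math.pi) * math.gamma(nu / 2)
    )
    return constant * (1 + x * x / nu) ** (-(nu + 1) / 2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_cdf(t, nu):
    """P(T <= t) for t >= 0, using the symmetry of the density about zero."""
    return 0.5 + simpson(lambda x: t_density(x, nu), 0.0, t)


# Table 3 example: the 0.975-quantile of t(9) is 2.262.
print(round(t_cdf(2.262, 9), 3))

# First row of Table 3: the 0.975-quantile of t(1) is 12.71.
print(round(t_cdf(12.71, 1), 3))
```

Both printed probabilities round to 0.975, in agreement with the tabulated quantiles.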
Table 4 Quantiles for $\chi^2$ distributions

df | 0.005
--- | ---
1 | 0.00004
2 | 0.010
3 | 0.072
4 | 0.207
5 | 0.412
6 | 0.676
7 | 0.989
8 | 1.34
9 | 1.73
10 | 2.16
11 | 2.60
12 | 3.07
13 | 3.57
14 | 4.07
15 | 4.60
16 | 5.14
17 | 5.70
18 | 6.26
19 | 6.84
20 | 7.43
21 | 8.03
22 | 8.64
23 | 9.26
24 | 9.89
25 | 10.52
26 | 11.16
27 | 11.81
28 | 12.46
29 | 13.12
30 | 13.79
31 | 14.46
32 | 15.13
33 | 15.82
34 | 16.50
35 | 17.19
36 | 17.89
37 | 18.59
38 | 19.29
39 | 20.00
40 | 20.71
45 | 24.31
50 | 27.99
55 | 31.73
60 | 35.53
65 | 39.38
70 | 43.28
75 | 47.21
80 | 51.17
85 | 55.17
90 | 59.20
100 | 67.33

Example: $P(W \le 33.92) = 0.95$, where $W \sim \chi^2(22)$.
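
The worked example for Table 4 can be checked by numerical integration. The Python sketch below uses the standard formula for the density of $\chi^2(r)$, which is an assumption introduced here because the handbook table does not list it, together with Simpson's rule. The function names are hypothetical, and the sketch is restricted to $r \ge 2$ so that the density is finite at zero.

```python
import math


def chi_squared_density(w, r):
    """Density of chi-squared(r) at w >= 0, for r >= 2 (standard formula)."""
    return w ** (r / 2 - 1) * math.exp(-w / 2) / (2 ** (r / 2) * math.gamma(r / 2))


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def chi_squared_cdf(w, r):
    """P(W <= w) for W ~ chi-squared(r), by integrating the density from 0."""
    return simpson(lambda u: chi_squared_density(u, r), 0.0, w)


# Table 4 example: the 0.95-quantile of chi-squared(22) is 33.92.
print(round(chi_squared_cdf(33.92, 22), 3))

# Tabulated 0.005-quantile of chi-squared(10) is 2.16.
print(round(chi_squared_cdf(2.16, 10), 3))
```

The first printed probability rounds to 0.95 and the second to 0.005, in agreement with the tabulated quantiles.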


